This project documents the process of choosing a machine learning model, showing that even simple models can deliver good results when applied coherently.

To that end, we will use a dataset with few observations, the Grid Search method for hyperparameter selection, and the Leave One Out method for model validation. The approach will be KNN (K-Nearest Neighbors), which rests on a simple theory and carries no potential pitfalls from distributional assumptions.

With that settled, let's start the analysis!

Setting the stage

Libraries and seed

require(kknn)
require(tidyverse)
require(GGally)
require(plotly)
set.seed(2100)

Functions

To keep the body of the code uncluttered, we will define a few functions up front and use them later, which makes future interpretation easier. They:

  • Create an empty data frame with columns that identify each model built
  • Populate an existing data frame, given a position and values
  • Compute the accuracy
  • Build combinations by grouping the chosen variables
# Create an empty results data frame
df_vazio <- function(){
  df <- tibble(
    K           = integer(),
    Distancia   = numeric(),   # numeric, not integer: the grid includes the distance 1.5
    Peso        = character(),
    Var_ID      = integer(),
    Score       = numeric()
  )
  return(df)
}
# Populate the data frame
popular_df <- function(df, contador, k, dist, peso, var, score){
  df[contador, "K"]           <- k
  df[contador, "Distancia"]   <- dist
  df[contador, "Peso"]        <- peso
  df[contador, "Var_ID"]      <- var
  df[contador, "Score"]       <- score
  
  return(df)
}
# Compute the accuracy
calcular_acc <- function(teste, fitted, variavel){
  real <- pull(teste[,variavel])
  acc <- ((sum(real == fitted))/length(fitted))
  
  return(acc)
}
# Build all non-empty variable combinations
calcular_combinacoes <- function(df, posicoes){
  df  <- df[, posicoes]
  col <- names(df)
  N <- length(col)
  variaveis <- list()
  index <- 0
  for(i in 1:N){
    index <- index + 1
    variaveis[[index]] <- combn(x = col, m = i)
  }
  
  variaveis_2 <- list()
  index <- 0
  N <- length(variaveis)
  for(i in 1:N){
    M <- ncol(variaveis[[i]])
    for(j in 1:M){
      index <- index + 1
      variaveis_2[[index]] <- variaveis[[i]][, j]
    }
  }
  
  return(variaveis_2)
}
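As a quick sanity check (not part of the original notebook), the enumeration that calcular_combinacoes performs can be reproduced with base combn alone: 5 predictors yield 2^5 − 1 = 31 non-empty subsets.

```r
# Enumerate every non-empty subset of the 5 predictor names with combn(),
# mirroring what calcular_combinacoes() builds from the data frame columns
cols <- c("fertility", "ppgdp", "lifeExpF", "pctUrban", "infantMortality")
subsets <- unlist(
  lapply(seq_along(cols), function(i) {
    m <- combn(cols, i)                          # one subset per column of m
    lapply(seq_len(ncol(m)), function(j) m[, j])
  }),
  recursive = FALSE
)
length(subsets)  # 31, i.e. 2^5 - 1
```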

Dados

The data are in the public domain and can be obtained from the carData package, under the name "National Statistics From The United Nations, Mostly From 2009–2011".

It contains information on 213 places, covering health, welfare, and education.

The goal of this project is not to handle imbalanced data, so we will keep only the Africa and Asia regions, which have similar numbers of observations.

Since we also do not have many features, we will not use any sophisticated feature-selection method; we will simply look at the observed performance of each model.

head(carData::UN) %>%
  print()
summary(carData::UN$region) %>% 
  print()
       Africa          Asia     Caribbean        Europe    Latin Amer North America NorthAtlantic       Oceania 
           53            50            17            39            20             2             1            17 
         NA's 
           14 
data <- carData::UN %>%
  as_tibble() %>%
  dplyr::filter(region %in% c('Africa', "Asia")) %>%
  dplyr::select(-group) %>%
  na.omit() %>%
  mutate(ID = 1:n())
print(data)
print(ggpairs(data, aes(colour = region)))

On the main diagonal we can see the density of each variable by region, with Africa in red and Asia in blue. Just by eye, I would say lifeExpF is my favorite for telling the regions apart.

Splitting the data

Now we will split our data into two parts:

  • Model: data used to build and select the best candidate model. (It will be further split into train and test.)
  • Out: data not used to build any model, reserved for checking the chosen model's performance on new data.

We have 102 observations in total: we will use 82 to build the model and 20 to test it.

data_fora <- data %>%
  group_by(region) %>%
  sample_n(size = 10, replace = FALSE)  # Randomly taking 10 observations from each region
print(data_fora)
data_model <- data %>%
  dplyr::filter(!(ID %in% data_fora$ID)) %>%
  sample_frac(size = 1) # Shuffling the data here
print(data_model)
# Fold
qtd_fold <- nrow(data_model)
folds <- cut(1:nrow(data_model), breaks = qtd_fold, labels = FALSE)
data_model <- data_model %>%
  mutate(Fold = folds)
print(data_model)
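A small aside on the cut() call above: with breaks equal to the number of rows, every observation lands in its own fold, which is exactly the leave-one-out layout. A toy sketch, with 8 rows standing in for the 82:

```r
# With breaks = number of rows, cut() assigns each row its own fold
n <- 8                                        # stand-in for the 82 modeling rows
folds <- cut(1:n, breaks = n, labels = FALSE)
folds  # one distinct fold number per observation
```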

Model Selection

Hyperparameter search: Grid Search

Grid Search is nothing more than a fancy name for testing every possible combination of the chosen values. In our case, we will vary a few values of K, a few values for the Minkowski distance, and the variable subsets used to train the model.

That gives 620 distinct models in total.

Validation: Leave One Out

As the name suggests: we leave one out.

For each of the 620 candidate models, we split the 82 observations into 82 parts; that is, we train on 81 observations and test on the remaining one. We then compute the mean accuracy over those 82 splits. Simple, isn't it?

This way of validating is controversial because it is very sensitive to outliers, but since we are working with a very small amount of data, it is not feasible to create a few splits each holding more data, such as K-Fold validation with 3, 5, or 10 parts. LOOCV (Leave One Out Cross Validation) thus becomes one way to work around the shortage of data.
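The 620 figure follows directly from the grid dimensions; a quick arithmetic check (assuming the 82-row modeling set described above):

```r
# Grid size: 31 variable subsets x 4 Minkowski distances x 5 values of K
n_subsets <- 2^5 - 1                  # non-empty subsets of the 5 predictors
n_models  <- n_subsets * 4 * 5        # distinct models to evaluate
n_fits    <- n_models * 82            # kknn calls made during LOOCV
c(models = n_models, fits = n_fits)   # 620 models, 50840 fits
```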

Modeling

Variables to be populated

The kknn package supports several kernel variants (which we are calling the weight), but we will only use the default, 'rectangular'. I strongly recommend reading the package documentation: https://www.rdocumentation.org/packages/kknn/versions/1.3.1/topics/kknn

dist <- c(1, 1.5, 2, 3)
peso <- 'rectangular'
kviz <- c(1, 3, 5, 7, 9)
acc <- c()
df <- df_vazio()
contador <- 1
variaveis <- calcular_combinacoes(df = data_model, posicoes = c(2:6))
print(head(variaveis, n = 5))
[[1]]
[1] "fertility"

[[2]]
[1] "ppgdp"

[[3]]
[1] "lifeExpF"

[[4]]
[1] "pctUrban"

[[5]]
[1] "infantMortality"
print(tail(variaveis, n = 5))
[[1]]
[1] "fertility"       "ppgdp"           "lifeExpF"        "infantMortality"

[[2]]
[1] "fertility"       "ppgdp"           "pctUrban"        "infantMortality"

[[3]]
[1] "fertility"       "lifeExpF"        "pctUrban"        "infantMortality"

[[4]]
[1] "ppgdp"           "lifeExpF"        "pctUrban"        "infantMortality"

[[5]]
[1] "fertility"       "ppgdp"           "lifeExpF"        "pctUrban"        "infantMortality"

Building the models

This small block of code is the cherry on top. It holds the whole essence of the analysis, recording every parameter combination and its performance so we can compare them later.

for(var_id in 1:length(variaveis)){
  var <- variaveis[[var_id]]
  for(d in dist){
    for(p in peso){
      for(k in kviz){
        for(fold in 1:qtd_fold){
          train <- data_model %>%
            dplyr::filter(Fold != fold) %>%
            dplyr::select(region, all_of(var))   # all_of(): select by character vector
          
          test <- data_model %>%
            dplyr::filter(Fold == fold) %>%
            dplyr::select(region, all_of(var))
          
          pred <- kknn(region ~ ., train = train, test = test, k = k, kernel = p, distance = d, scale = TRUE)
          
          acc[fold] <- calcular_acc(teste = test, fitted = pred$fitted.values, variavel = "region")
        }
        df <- popular_df(df = df, contador = contador, k = k, dist = d, peso = p, var = var_id, score = mean(acc))
        contador <- contador + 1
      }
    }
  }
}

Results

Checking the performances

df %>%
  arrange(desc(Score)) %>%
  print()
df %>%
  arrange(Score) %>%
  print()

The best model reached an accuracy close to 90% and the worst close to 60%: a stark performance gap. Naturally, we will now take the parameters of the most accurate model and test it on the data we held out of the modeling stage.

Applying the model

escolhido <- df %>%
  arrange(desc(Score)) %>%
  slice(1)
print(escolhido)
k_escolhido    <- escolhido$K
peso_escolhido <- escolhido$Peso
dist_escolhido <- escolhido$Distancia
var_escolhido  <- variaveis[[escolhido$Var_ID]]
print(var_escolhido)
[1] "fertility" "ppgdp"     "lifeExpF" 
plot_ly(x = data_model$fertility, 
        y = data_model$ppgdp, 
        z = data_model$lifeExpF, 
        color = data_model$region,
        type = 'scatter3d',
        mode = 'markers')  # set the mode explicitly to silence plotly's warning

fit <- kknn(region ~ ., 
          train = dplyr::select(data_model, region, all_of(var_escolhido)), 
          test = data_fora, 
          k = k_escolhido, 
          kernel = peso_escolhido, 
          distance = dist_escolhido,
          scale = TRUE)
data.frame(True = data_fora$region,
           Predict = fit$fitted.values) %>%
  print()
table(fit$fitted.values == data_fora$region) %>%
  print()

FALSE  TRUE 
    1    19 
calcular_acc(teste = data_fora, fitted = fit$fitted.values, variavel = 'region') %>%
  print()
[1] 0.95

Validation on new data performed better than expected. But that is something that can always happen, just as it could have performed worse than we expected. After all, however hard we try to build the best possible model, we work with statistics and probability, not crystal balls.

And so we conclude our project. We saw that even simplistic models can deliver good results, and there is no reason to panic when starting a Data Science project. Starting with the most complex solution is putting the cart before the horse: there are cases where a solution only a senior could develop would perform just slightly better than what an (intern + Google) could achieve with a logistic regression or KNN.
